IDA: A System for Automated Sorting, Indexing, and Classification of Documents
Authors
Abstract
IDA (Intelligent Document Analysis) is a modular software system that helps automate paper document entry. IDA consists of the following components: layout analysis, preclassification, an OCR interface, fuzzy string matching, text categorization, and lexical, syntactic, and semantic analysis. The system has been applied to a variety of tasks: presorting of forms, reports, and letters; index extraction for archiving and retrieval; text column analysis in real estate register documents; in-house mail distribution; and classification of business letters by text content. This paper presents an overview of the architecture and applications of the system.
Similar resources
Text Categorization
Text categorization (also known as text classification, or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. This task has several applications, including automated indexing of scientific articles according to predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to info...
Automatic Analysis and indexing of variable-layout documents
In this paper a methodology for analysis and automatic indexing of imaged documents within an archiving and retrieval system is described. This system, which is being developed within the Esprit project STRETCH (STorage and RETrieval by Content of imaged documents), is based on a new generation Archiving and Retrieval Engine (ARE), which overcomes the bottleneck of document profiling by allevia...
An automated approach to analysis and classification of Crypto-ransomwares’ family
There is no doubt that malicious programs are a permanent threat to computer systems. They disrupt the normal operation of computer systems to serve their malicious purposes. One type of malware, known as ransomware, restricts victims' access to their computer systems either by encrypting the victim's files or by locking the system. Despite ...
Improving the Quality of Text Classification Using a Two-Level Classifier Committee
Automated text classification has gained particular importance due to the increasing availability of documents in digital form and the ensuing need to organize them. Although this problem belongs to the field of Information Retrieval (IR), the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown better performance than the others. I...
Investigating the Adoption Rate of Students' Mental Model with the Structure of the Learning Management System of the University of Tehran by Card Sorting Method
Background and Aim: E-learning is an important topic in educational settings, and students, as a significant prerequisite for it, play an essential role in the acceptance and effective use of e-learning management systems, so knowing their attitudes and mental models is essential for the successful implementation of such a method. Therefore, the aim of this study was to investigate...